This disclosure relates in general to biographic record processing and, but not by way of limitation, to biographic record processing for datasets with biometric information.
Some datasets contain redundant records. Duplicate records can result from fraud or clerical error. For example, a dataset of driver's license information could hold biographic information on each license holder along with a photograph. Two licenses with different biographic information could carry a picture of the same individual because of a clerical error or fraud.
Other problems are created by individuals posing under multiple identities. A particular individual could have fabricated biographic information in two records such that the records do not correlate or correlate only weakly. Manual review of a large dataset is unlikely to find these duplicates. Even where photographs are part of the dataset, a human reviewer is not likely to notice two similar photos among many records. An individual who obscures his or her identity with a disguise is likely to thwart any manual review.